580 research outputs found
A clustering algorithm for multivariate data streams with correlated components
Common clustering algorithms require multiple scans of all the data to
achieve convergence, which is prohibitive when large databases, with data
arriving in streams, must be processed. Algorithms extending the popular
K-means method to the analysis of streaming data have been present in the
literature since 1998 (Bradley et al. in Scaling clustering algorithms to large
databases. In: KDD. p. 9-15, 1998; O'Callaghan et al. in Streaming-data
algorithms for high-quality clustering. In: Proceedings of IEEE international
conference on data engineering. p. 685, 2001), based on the memorization and
recursive update of a small number of summary statistics, but they either do
not take into account the specific variability of the clusters, or assume that
the random vectors which are processed and grouped have uncorrelated components.
Unfortunately this is not the case in many practical situations. We here
propose a new algorithm to process data streams, with data having correlated
components and coming from clusters with different covariance matrices. Such
covariance matrices are estimated via an optimal double shrinkage method, which
provides positive definite estimates even in the presence of few data points,
or of data having components with small variance. This is needed to invert the
matrices and compute the Mahalanobis distances that we use for assigning data
to the clusters. We also estimate the total number of clusters from the data.
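The role of shrinkage here can be illustrated with a toy sketch. The snippet below is not the paper's optimal double shrinkage estimator: it simply blends the sample covariance with its diagonal and a scaled identity using invented fixed weights, which already shows why shrinkage yields a positive definite (hence invertible) matrix even with fewer points than dimensions, so a Mahalanobis distance can be computed.

```python
import numpy as np

def double_shrinkage_cov(X, alpha=0.2, beta=0.2):
    """Blend the sample covariance with its diagonal and a scaled identity.
    alpha and beta are illustrative fixed weights; the paper instead derives
    optimal shrinkage intensities."""
    S = np.cov(X, rowvar=False)
    diag_target = np.diag(np.diag(S))
    identity_target = (np.trace(S) / S.shape[0]) * np.eye(S.shape[0])
    return (1 - alpha - beta) * S + alpha * diag_target + beta * identity_target

def mahalanobis(x, mean, cov):
    d = x - mean
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))            # only 3 points in 5 dimensions
S = np.cov(X, rowvar=False)            # sample covariance: rank-deficient here
C = double_shrinkage_cov(X)
print(np.all(np.linalg.eigvalsh(C) > 0))   # shrunk estimate is positive definite
print(mahalanobis(X[0], X.mean(axis=0), C))
```

The singular sample covariance could not be inverted at all in this regime, while the shrunk estimate supports the distance computation directly.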
Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach
Regularizing Deep Neural Networks (DNNs) is essential for improving
generalizability and preventing overfitting. Fixed penalty methods, though
common, lack adaptability and suffer from hyperparameter sensitivity. In this
paper, we propose a novel approach to DNN regularization by framing the
training process as a constrained optimization problem, where the data fidelity
term is the minimization objective and the regularization terms serve as
constraints. We then employ the Stochastic Augmented Lagrangian (SAL) method
to achieve a more flexible and efficient regularization mechanism. Our approach
extends beyond black-box regularization, demonstrating significant improvements
in white-box models, where weights are often subject to hard constraints to
ensure interpretability. Experimental results on image-based classification on
the MNIST, CIFAR10, and CIFAR100 datasets validate the effectiveness of our
approach. SAL consistently achieves higher accuracy while also achieving better
constraint satisfaction, thus showcasing its potential for optimizing DNNs
under constrained settings.
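The augmented-Lagrangian idea behind this can be sketched on a toy problem. The snippet below is not the paper's SAL algorithm for DNNs: it runs mini-batch gradient steps on an invented least-squares objective with an invented squared-norm budget constraint, alternating primal updates on the weights with dual ascent on the multiplier, which is the general mechanism being adapted.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=200)

budget = 1.0          # invented constraint: ||w||^2 <= budget, playing the regularizer's role
rho, lr, dual_lr = 5.0, 0.05, 0.05
w, lam = np.zeros(10), 0.0

for step in range(3000):
    idx = rng.choice(200, size=32, replace=False)       # stochastic mini-batch
    Xb, yb = X[idx], y[idx]
    g = w @ w - budget                                  # constraint violation
    grad_f = 2 * Xb.T @ (Xb @ w - yb) / len(idx)        # data-fidelity gradient
    grad_g = 2 * w
    # gradient of the augmented Lagrangian  f + lam*g + (rho/2)*max(g, 0)^2
    w -= lr * (grad_f + (lam + rho * max(g, 0.0)) * grad_g)
    lam = max(0.0, lam + dual_lr * g)                   # dual ascent, multiplier kept nonnegative

print(round(w @ w, 2))   # settles near the budget once the constraint is active
```

Unlike a fixed penalty, the multiplier adapts: it grows while the constraint is violated and stops growing once it is satisfied, which is the flexibility the abstract refers to.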
From Noisy Point Clouds to Complete Ear Shapes: Unsupervised Pipeline
Funding Information: This work was supported in part by the European Union's Horizon 2020 Research And Innovation Programme through the Marie Skłodowska-Curie Project BIGMATH, under Agreement 812912, and in part by the Eureka Eurostars under Project E!11439 FacePrint. The work of Cláudia Soares was supported in part by the Strategic Project NOVA LINCS under Grant UIDB/04516/2020. Publisher Copyright: © 2013 IEEE.
Ears are a particularly difficult region of the human face to model, not only due to the non-rigid deformations existing between shapes but also to the challenges in processing the retrieved data. The first step towards obtaining a good model is to have complete scans in correspondence, but these usually present a higher amount of occlusions, noise and outliers when compared to most face regions, thus requiring a specific procedure. Therefore, we propose a complete pipeline taking as input unordered 3D point clouds with the aforementioned problems, and producing as output a dataset in correspondence, with completion of the missing data. We provide a comparison of several state-of-the-art registration and shape completion methods, concluding on the best choice for each of the steps.
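As an illustration of the kind of outlier handling such a pipeline needs, the snippet below sketches the classic statistical outlier-removal filter on a synthetic cloud. It is a generic illustration with invented parameters and data, not the paper's procedure.

```python
import numpy as np

def remove_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is more
    than std_ratio standard deviations above the cloud-wide average
    (statistical outlier removal; parameters are illustrative)."""
    # full pairwise distance matrix: fine for small clouds, use a KD-tree at scale
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, 1:k + 1]        # skip the zero self-distance
    mean_d = knn.mean(axis=1)
    keep = mean_d <= mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

rng = np.random.default_rng(0)
cloud = rng.normal(scale=0.05, size=(200, 3))     # dense synthetic surface patch
noise = rng.uniform(-2, 2, size=(10, 3))          # sparse far-away outliers
cleaned = remove_outliers(np.vstack([cloud, noise]))
print(len(cleaned))    # the dense patch survives, most stray points are dropped
```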
Cost Optimization of Ice Distribution
Two questions regarding minimizing fuel costs while delivering ice along a pre-set route are tackled.
The first question arises when demand exceeds the load of a single truck, so that a second truck of ice has to be taken to some point of the route for the driver/salesman to continue with for the rest of the route. Is it better:
1) for the first truck to deliver starting from the customer nearest to the base, or
2) for the first truck to start the delivery from the last customer (the one most distant from the base)?
We show that the second strategy was better for the particular data examined, and we derive the basis of an algorithm for deciding which strategy is better for a given delivery schedule.
The second question concerns how best to modify a regular sales route when an extra delivery has to be made. Again, the basis for an algorithm to decide how to minimize fuel costs is derived.
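The first question can be made concrete with a toy cost model. Everything below is invented for illustration: fuel burn is taken as distance times carried weight, return trips are ignored, and the driver switches to a full second truck driven out from the base whenever the next demand exceeds the remaining load. With these particular numbers the nearest-first strategy happens to win; the data in the study favoured the second strategy, which is precisely why a per-schedule decision algorithm is needed.

```python
TARE, CAP = 2.0, 10.0                                   # invented truck weight, ice capacity
CUSTOMERS = [(2, 3.0), (5, 4.0), (9, 5.0), (12, 4.0)]   # (km from base, ice demand)

def leg(d_from, d_to, load):
    """Fuel for one leg: distance times total carried weight (toy model)."""
    return abs(d_to - d_from) * (TARE + load)

def route_cost(order):
    """Serve customers in the given order with one truck; when the next demand
    exceeds the remaining load, a full second truck is driven out from the base
    to the current position and the driver continues with it.  Return trips
    are ignored to keep the model small."""
    cost, load, pos = 0.0, CAP, 0.0
    for d, demand in order:
        if demand > load:
            cost += leg(0.0, pos, CAP)     # second truck hauled out full
            load = CAP
        cost += leg(pos, d, load)
        load -= demand
        pos = d
    return cost

strategy1 = route_cost(CUSTOMERS)                    # 1) nearest customer first
strategy2 = route_cost(list(reversed(CUSTOMERS)))    # 2) farthest customer first
print(strategy1, strategy2)                          # → 180.0 348.0
```

Which strategy wins depends on where along the route the swap falls and how far the second truck must haul its full load, so the comparison has to be recomputed for each schedule.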
- …